Abstract


Algorithms that list graphs such that no two listed graphs are isomorphic are important building
blocks of systems for mining and learning in graphs. Algorithms are already known that solve this 
problem efficiently for many classes of graphs of restricted topology, such as trees. In this article 
we introduce the concept of a dense augmentation schema, and introduce an algorithm that can be 
used to enumerate any class of graphs with polynomial delay, as long as the class of graphs can be 
described using a monotonic predicate operating on a dense augmentation schema. In practice this 
means that this is the first enumeration algorithm that can be applied theoretically efficiently in any 
frequent subgraph mining algorithm, and that this algorithm generalizes to situations beyond the 
standard frequent subgraph mining setting. 

1. Introduction


Among the most prominent graph mining problems is the problem of finding frequent subgraphs 
in databases of small graphs of any topology. This is witnessed by the large number of algorithms 
that have been proposed for this task (Yan and Han, 2002; Borgelt and Berthold, 2002; Kuramochi 
and Karypis, 2004; Inokuchi et al., 2003; Inokuchi, 2004; Huan et al., 2003; Nijssen and Kok, 
2004; Leskovec et al., 2006). A fundamental problem that is addressed in all these works is how to 
enumerate a set of graphs such that no two graphs in the enumerated set are isomorphic with each 
other. The main motivation for this focus is that if duplicates were not avoided, these algorithms
would access the data more often than necessary and produce results that are larger than required. 


To avoid isomorphic graphs in their output, all these existing graph mining algorithms use a 
methodology based on canonical codes. A canonical code is a code that uniquely identifies a set 
of isomorphic graphs. To determine if a graph should be part of the output, its canonical code is 
computed, and, in some algorithms, compared with the canonical codes of graphs found before. 


A fundamental problem with the canonical code based approach, however, is that we essentially 
need to solve the graph isomorphism problem: if we could compute the canonical code of any 
graph efficiently, we could compute the codes of two graphs to determine if they are isomorphic. 
The state of the art is that no polynomial algorithm for the graph isomorphism problem is known. 
Consequently, it can be shown that when the existing graph mining algorithms are enumerating 
candidate subgraphs, the delay between two enumerated graphs is exponential in the worst case (in 
terms of the size of the largest graph enumerated). 

The contribution of this article is that we introduce a novel algorithm for enumerating graphs
that does not use canonical codes, and incrementally maintains data structures that ensure that no 
two isomorphic graphs are listed. We show that in contrast to other algorithms for enumerating 
graphs, this new algorithm outputs many classes of graphs, including arbitrary connected graphs, 
with polynomial delay, which makes this algorithm theoretically more efficient than any other graph 
enumeration algorithm used in the graph mining literature. 


It is important to note that our algorithm works for many classes of graphs. If we restrict
the topology of the graphs, for instance, to only those graphs that are trees, the enumeration problem 
of frequent graph mining is already known to be more efficiently solvable, and algorithms are known 
(Wright et al., 1986; Nakano and Uno, 2004) and used in practice (Chi et al., 2005; Horvath et al., 
2006). 


Even though the frequency constraint is the most popular constraint in the graph mining litera- 
ture, other constraints have been studied as well. An important property of the frequency constraint 
is that it is monotonic w.r.t. subgraph isomorphism: if a graph is frequent, all its subgraphs are
also frequent. In our algorithm we exploit this property to maintain data structures incrementally. 
An interesting question is to what extent enumeration with polynomial delay is feasible when the 
graphs to enumerate are not monotonic under the subgraph isomorphism relation. To this aim, we 
developed the concept of an augmentation schema. The augmentation schema defines relations be- 
tween graphs in the space of graphs to enumerate (in the simplest case, the subgraph isomorphism 
relation). We will show that enumeration with polynomial delay is possible as long as an aug- 
mentation schema satisfies certain conditions, and the graphs to enumerate can be specified using a 
monotonic predicate w.r.t. the augmentation schema. We will specify our algorithm in terms of such 
augmentation schemas. This makes our method general enough to be applied in settings beyond the 
traditional frequent subgraph mining setting, and allows us also to enumerate both connected and 
unconnected graphs. For instance, we can also enumerate hereditary classes of graphs with bounded 
degree; a class of graphs is called hereditary if it is monotonic under the induced subgraph relation, 
instead of the traditional subgraph relation. 


The problem of graph enumeration has not only been studied in the graph mining literature. In 
particular, Goldberg showed in the early nineties that there is a polynomial delay algorithm to list 
all graphs (Goldberg, 1992). We will provide more details about this algorithm in Section 3, where 
we will show that this algorithm cannot be used to list graphs that satisfy a monotonic predicate, as 
required in a graph mining setting. Many algorithms exist for enumerating classes of graphs without 
taking into account isomorphisms, such as classes of graphs described by first order logic formulas 
(Goldberg, 1993) and edge-maximal graphs with bounded branchwidth (Paul et al., 2006); it is not 
known how to list these classes while taking into account isomorphisms. Heuristic implementations 
exist for enumerating graphs in general (McKay, 1998), but these do not guarantee polynomial 
delay. 


Our algorithm uses ideas similar to those of the algorithm of Goldberg (1992). In particular, as
our algorithm maintains a data structure incrementally, it requires that all computed subgraphs
are stored. In the pattern mining setting, where we are interested in finding these graphs, this is a
common assumption.


This article is the full version of a workshop abstract (Ramon and Nijssen, 2007). Compared to 
the workshop abstract, in this article (1) we show how our method extends to other classes of graphs 
than connected graphs and (2) we provide full details and proofs. 

The article is organized as follows. In Section 2 we introduce the problem of subgraph mining.
In Section 3 we show why the algorithm of Goldberg is too limited for applications in graph mining. 
In Section 4 we introduce the concept of augmentation schemas and formally define the enumeration 
problems that we are addressing. In Section 5 we state our results. In Section 6 we provide a 
short introduction to concepts in group theory which we need in Section 7, where we outline our 
algorithm; Section 8 concludes. The proofs of our claims are given in an appendix. 


2. Motivation 


The main motivation for our work is the problem of efficiently mining subgraphs under constraints. 
The most common such problem is the problem of mining frequent subgraphs in a database of small 
graphs. We will first give a formal definition of this problem. 


A graph g is a tuple (V, E) where V is a set of vertices and $E \subseteq V \times V$ is a set of edges. We denote
with V(g) the set of vertices and with E(g) the set of edges of a graph g. In this article we restrict 
ourselves to unlabeled, simple graphs (i.e., undirected, unweighted, no loops, no multiple edges 
between two nodes). It is easy to lift these restrictions. In particular, in frequent subgraph mining it
is usually assumed that graphs have labels. However, our discussion is simplified by assuming that 
we do not have labels; this is not a fundamental restriction of our methodology. 


There are many ways in which one can restrict the topology of graphs. For instance, a path is
a connected graph in which all nodes have degree two, except two nodes, which have degree one.
A tree is a connected graph with k nodes and $k - 1$ edges. When we use the word graph, we refer
to graphs that have no a priori restriction on their topology (except being unlabeled and simple).


Between two graphs we can define the graph isomorphism and the subgraph isomorphism relations.
Our definitions are as usual in the literature: two graphs $g_1$ and $g_2$ are isomorphic iff there is
a bijection $\varphi : V(g_1) \to V(g_2)$ such that
$(v_1, v_2) \in E(g_1) \Leftrightarrow (\varphi(v_1), \varphi(v_2)) \in E(g_2)$. We denote this
by $g_1 \simeq_\varphi g_2$, where $\varphi$ is the bijection between the graphs. The bijection can be
omitted if this is clear from the context.

A graph $g_1$ is subgraph isomorphic to $g_2$ iff there is a subgraph $(V', E')$ with $V' \subseteq V(g_2)$ and
$E' \subseteq E(g_2)$, such that $g_1$ is isomorphic with $(V', E')$. This is denoted with
$g_1 \preceq_\varphi g_2$, where $\varphi$ is the bijection between the nodes of $g_1$ and the subset of
nodes of $g_2$. A subgraph $(V', E')$ of $g_2$ is an induced subgraph if for all
$v, v' \in V' : \{v, v'\} \in E(g_2) \Rightarrow \{v, v'\} \in E'$; in other words, all edges in $g_2$
between nodes in $V'$ are also present in $E'$. A graph $g_1$ is induced subgraph isomorphic to $g_2$
iff $g_1$ is isomorphic with an induced subgraph of $g_2$.
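
The three relations defined above can be made concrete with a small brute-force sketch. It is only an illustration of the definitions, not a method used in this article: the function names and the representation of a graph as a pair of frozensets are assumptions made for the example, and the search over all injective maps is exponential.

```python
from itertools import permutations

def graph(vertices, edges):
    """Build a simple undirected graph as (vertex set, set of 2-element edges)."""
    return (frozenset(vertices), frozenset(frozenset(e) for e in edges))

def embeddings(g1, g2):
    """Yield every injective map phi: V(g1) -> V(g2) that sends edges to edges."""
    (v1, e1), (v2, e2) = g1, g2
    v1 = sorted(v1)
    for image in permutations(sorted(v2), len(v1)):
        phi = dict(zip(v1, image))
        if all(frozenset(phi[x] for x in e) in e2 for e in e1):
            yield phi

def is_subgraph_isomorphic(g1, g2, induced=False):
    """Brute-force test whether g1 is (induced) subgraph isomorphic to g2."""
    for phi in embeddings(g1, g2):
        if not induced:
            return True
        image = set(phi.values())
        # Induced case: every edge of g2 between image vertices must be
        # the image of an edge of g1, so the two edge counts must agree.
        inside = sum(1 for e in g2[1] if e <= image)
        if inside == len(g1[1]):
            return True
    return False

def is_isomorphic(g1, g2):
    """An embedding between graphs with equal vertex and edge counts is a bijection."""
    same_size = len(g1[0]) == len(g2[0]) and len(g1[1]) == len(g2[1])
    return same_size and is_subgraph_isomorphic(g1, g2)

triangle = graph([1, 2, 3], [(1, 2), (2, 3), (3, 1)])
path = graph([1, 2, 3], [(1, 2), (2, 3)])
print(is_subgraph_isomorphic(path, triangle))
print(is_subgraph_isomorphic(path, triangle, induced=True))
```

The path on three nodes is a subgraph of the triangle, but not an induced one, because the triangle has a third edge between the same three nodes.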

The graph isomorphism and subgraph isomorphism problems should not be confused with each
other. The subgraph isomorphism problem is known to be NP-complete, while the graph isomorphism
problem is believed to be in a complexity class of its own. For both problems in general no
polynomial algorithm is known (Köbler et al., 1993).

The problem of frequent subgraph mining can now be formalized as follows. Given is a database 
of graphs, $DB = \{g_1, g_2, \ldots, g_n\}$, and a threshold t. Then we are interested in finding all graphs g
for which the support is higher than or equal to t. The support of a graph g is the number of graphs 
in DB with which g is subgraph isomorphic. 

This frequent subgraph mining problem can be generalized by replacing the minimum frequency 
constraint with other predicates. For instance, a predicate could involve an additional maximum size 
constraint. 

A predicate on graphs is called monotonic if all subgraphs of a graph that satisfies the predicate
also satisfy the predicate.¹ The support constraint and the maximum size constraint are
examples of predicates that are monotonic under subgraph isomorphism.

The problem of constraint-based subgraph mining is closely related to the problem of frequent 
item set mining. Many algorithms have been developed to tackle the frequent item set mining 
problem, the most well-known being the APRIORI algorithm (Agrawal et al., 1996). Both frequent 
graph mining algorithms and frequent item set mining algorithms are considered to be constraint- 
based pattern mining algorithms. Constraint-based pattern mining algorithms look for patterns in a 
pattern language $\mathcal{L}$, and assume that these patterns are ordered using a partial order relation $\preceq$ on
$\mathcal{L}$. In the case of graph mining, $\preceq$ is usually the subgraph isomorphism relation.

Many algorithms for pattern mining are level-wise (breadth-first) enumeration algorithms. These 
algorithms assume that the size of a pattern in the language is well-defined, and look for the pat- 
terns by listing them increasing in size. A high-level description of such an algorithm is given in 
Algorithm 1. 


Algorithm 1 Level-Wise Pattern Miner

Require: A pattern language $\mathcal{L}$ and a monotonic predicate p
Ensure: output all $g \in \mathcal{L}$ with p(g)

1: $C_1 \leftarrow$ patterns of size 1
2: $k \leftarrow 1$
3: while $C_k \neq \emptyset$ do
4:     $F_k \leftarrow \{g \in C_k \mid p(g)\}$
5:     Generate $C_{k+1}$ from $F_k$
6:     $k \leftarrow k + 1$
7: end while
8: Output $\bigcup_k F_k$





In this algorithm, $F_k$ contains the patterns of size k that satisfy the predicate. In line 4 it is
determined which candidates of size k satisfy the predicate. In frequent pattern mining, this line
requires access to the data, and can be the most time-consuming step. It is therefore essential that
$C_k$ be as small as possible.

The main focus of this article is on the computation that needs to be performed in line 5. In this 
line new candidates should be generated. This generation should ensure the following: 


• by repeatedly generating new candidates we should be able to enumerate all patterns in the
pattern space, in our case the space of all unlabeled, simple graphs;


• to ensure that the algorithm is as efficient as possible, we should not insert two patterns in
$C_{k+1}$ that are equivalent with each other; in our case, we should avoid inserting two graphs
that are isomorphic;


• we should not insert patterns in $C_{k+1}$ for which we can know beforehand that p will not be
true; in our case, we should exploit the monotonicity of p to avoid inserting graphs of which
a subgraph is not included in $F_k$.





1. We adopt here the terminology most common in graph theory. Some authors in the data mining literature use the 
term ‘anti-monotonic’. 
In the graph mining setting, the second and third requirements are difficult, as the second
requirement requires us to solve a graph isomorphism problem, and the third requirement involves a
subgraph isomorphism problem.
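
To make the structure of Algorithm 1 tangible, the following is a minimal sketch of a level-wise miner for connected patterns, where size is the number of edges. All names (`embeds`, `extensions`, `level_wise`) are hypothetical, and the duplicate check is a naive pairwise isomorphism test, which is exactly the kind of exponential step this article aims to avoid. The third requirement (pruning candidates that have an infrequent subgraph) is left out for brevity.

```python
from itertools import permutations

def graph(edges):
    """Graph without isolated vertices, given by its edge list."""
    edges = frozenset(frozenset(e) for e in edges)
    return (frozenset(v for e in edges for v in e), edges)

def embeds(pattern, host):
    """Brute-force subgraph isomorphism test (exponential in the worst case)."""
    (pv, pe), (hv, he) = pattern, host
    pv = sorted(pv)
    for image in permutations(sorted(hv), len(pv)):
        phi = dict(zip(pv, image))
        if all(frozenset(phi[x] for x in e) in he for e in pe):
            return True
    return False

def isomorphic(g1, g2):
    """Equal sizes plus an embedding imply an isomorphism."""
    return (len(g1[0]) == len(g2[0]) and len(g1[1]) == len(g2[1])
            and embeds(g1, g2))

def extensions(g):
    """One-edge extensions that keep the graph connected."""
    vs, es = g
    new = max(vs) + 1
    for u in vs:
        yield (vs | {new}, es | {frozenset((u, new))})   # attach a new vertex
        for w in vs:
            if u < w and frozenset((u, w)) not in es:
                yield (vs, es | {frozenset((u, w))})     # close a cycle

def level_wise(db, t):
    """Sketch of Algorithm 1 with p(g) = 'support of g in db is at least t'."""
    def support(g):
        return sum(embeds(g, h) for h in db)
    candidates = [graph([(0, 1)])]                       # line 1: C_1
    result = []
    while candidates:                                    # line 3
        frequent = [g for g in candidates if support(g) >= t]   # line 4
        result.extend(frequent)
        candidates = []                                  # line 5: build C_{k+1}
        for g in frequent:
            for c in extensions(g):
                if not any(isomorphic(c, d) for d in candidates):
                    candidates.append(c)
    return result                                        # line 8

triangle = graph([(0, 1), (1, 2), (0, 2)])
path = graph([(0, 1), (1, 2)])
print(len(level_wise([triangle, path], 2)))
```

With a database of two graphs and a threshold of 2, the length of the result counts the connected patterns with at least one edge that both graphs share.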

Algorithm 1 has applications beyond traditional frequent subgraph mining. For instance, if we 
are interested in computing a decomposition graph kernel between two graphs which counts the 
number of non-isomorphic subgraphs that two graphs have in common, we could compute this 
kernel by providing Algorithm 1 with a database of two graphs as input and a threshold of t = 2. The size
of the output is the desired kernel value. 

Similarly, we could be interested in enumerating all different graphs that include one node in a 
network (Leskovec et al., 2006). In this case, the input of Algorithm 1 consists of one graph with 
all nodes up to a certain threshold distance from the node of interest, and the subgraph isomorphism 
should be restricted such that only bijections are considered in which at least one node in the pattern 
is mapped to the special node in the data. 

In all cases, the essential problem of enumerating graphs without duplicates remains. Several 
algorithms have been proposed in the literature to address this graph enumeration problem. The 
main idea that has been employed, is that for every graph, we can compute a canonical code, that is, 
a code that is unique for all graphs that are isomorphic. The level-wise graph miners AGM (Inokuchi 
et al., 2003) and FSG (Kuramochi and Karypis, 2004) define a canonical code from adjacency 
matrices. Essentially, all subgraphs are stored in a data structure that is indexed according to this 
canonical code, and duplicates are avoided by computing for every candidate the canonical code. 
The approaches of AGM and FSG differ in their definition of size: AGM grows graphs by adding 
nodes, FSG by adding edges. 

Other graph miners search depth-first, but their enumeration strategy can easily be modified for 
use in a level-wise algorithm (Yan and Han, 2002; Huan et al., 2003; Nijssen and Kok, 2004). Also 
these algorithms use a canonical code, but do not require the use of an indexed data structure. An 
algorithm for enumerating graphs surrounding a node in a network was proposed by Leskovec et al. 
(2006). Again, this algorithm used a canonical code. 

Unfortunately, currently no polynomial algorithm is known to compute a canonical code; if one
were known, we would be able to solve the graph isomorphism problem in polynomial time. Overall,
this means that in all existing graph mining algorithms exponential time can be spent between two 
graphs that are inserted in the set of candidates. 

Enumeration algorithms for which this is not the case, that is, algorithms that solve an enumera- 
tion problem such that between any two enumerated solutions polynomial time is spent (in terms of 
the largest enumerated solution), are known as algorithms with polynomial delay. To the best of our 
knowledge, all algorithms that have been proposed in the graph mining literature for enumerating 
graphs in general do not have polynomial delay. Only for restricted classes of graphs, such as trees 
and outerplanar graphs, algorithms with polynomial delay are known (Chi et al., 2005; Horvath 
et al., 2006). 

However, the fact that graph isomorphism is not known to be polynomially computable, does 
not imply that graph enumeration cannot be solved with polynomial delay. Even though ignored 
in the data mining and machine learning literature, a polynomial algorithm for enumerating graphs 
does exist and was proposed by Goldberg (1992). 

The problem with this algorithm is that it solves a rather simple enumeration problem: given 
a bound on the size of the graphs to enumerate, Goldberg’s algorithm lists all graphs of this size 
with polynomial delay. In the case of data mining and machine learning, we are dealing with more 
complicated monotonic constraints that are data-dependent. We will show in the next section that we
can create databases such that the set of graphs to enumerate does not fulfill the basic assumptions 
that need to be satisfied in Goldberg’s algorithm. Even worse, we will see that this type of data is 
very common. 

It should be stressed that this paper only studies the candidate generation of graph mining al- 
gorithms; it does not study the frequency evaluation. For general graphs the frequency evaluation 
also takes exponential time; a general frequent graph miner which uses our enumeration algorithm 
will still have exponential delay due to the fact that frequency evaluation is still exponential. This 
article proposes an improvement only of the candidate generation phase. The key insight is that we 
devised a graph enumeration algorithm which does not use canonical codes to perform this task. 


3. Goldberg’s Algorithm 


In this section we briefly discuss the key points in the algorithm of Goldberg (1992), which shows 
why this algorithm cannot be used in a pattern mining setting. 

Goldberg’s algorithm aims at listing all graphs with n nodes, and makes a distinction between
easy and hard graphs. A graph g is easy if it satisfies at least one of these two properties:


• g has a vertex with degree $n - 1$, that is, at least one vertex is connected to all other vertices;


• g has only one vertex, say v, of maximum degree and $g - v$ is rigid, that is, the graph $g - v$ has
only one isomorphism with itself (called the identity automorphism in Section 6).


An example of a graph that is never rigid, is a path. 
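
The definition of an easy graph can be checked directly on small examples. The sketch below is an illustration under assumptions: the function names are hypothetical, the graph is assumed to be non-empty, and rigidity is tested by counting automorphisms by brute force, which is not how Goldberg's algorithm proceeds.

```python
from itertools import permutations

def automorphism_count(vertices, edges):
    """Count the vertex permutations that map the edge set onto itself."""
    vs = sorted(vertices)
    edges = set(edges)
    count = 0
    for image in permutations(vs):
        phi = dict(zip(vs, image))
        if {frozenset(phi[x] for x in e) for e in edges} == edges:
            count += 1
    return count

def is_easy(vertices, edges):
    """Test the two properties of an easy graph on a non-empty simple graph."""
    vertices = frozenset(vertices)
    edges = {frozenset(e) for e in edges}
    n = len(vertices)
    degree = {v: sum(v in e for e in edges) for v in vertices}
    # Property 1: some vertex is adjacent to all other vertices.
    if any(d == n - 1 for d in degree.values()):
        return True
    # Property 2: a unique vertex v of maximum degree, and g - v is rigid.
    top = max(degree.values())
    top_vertices = [v for v in vertices if degree[v] == top]
    if len(top_vertices) != 1:
        return False
    v = top_vertices[0]
    rest_edges = {e for e in edges if v not in e}
    return automorphism_count(vertices - {v}, rest_edges) == 1

star = (range(4), [(0, 1), (0, 2), (0, 3)])
path4 = (range(4), [(0, 1), (1, 2), (2, 3)])
spider = (range(5), [(0, 1), (0, 2), (0, 3), (3, 4)])
print(is_easy(*star), is_easy(*path4), is_easy(*spider))
```

The star is easy by the first property. The path has two vertices of maximum degree, and the spider has a unique vertex of degree three whose removal leaves isolated vertices and a short path, which is not rigid.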
Let E(n) be the set of easy graphs with n nodes, and U(n) the set of all graphs with n nodes, 
then it was shown by Goldberg that 
2|E(n)| > |U (n)|. 


This property implies that a large fraction of the graphs to enumerate are in fact easy. It was then 
shown that E (n) can be listed with polynomial delay, and that H (n) = U (n) \ E(n) can be listed in 
O(n*|U(n)|) time steps, where |U (n)| is exponential in n but linear in the number of solutions. The 
main idea is then to interleave these two methods. The method which lists easy graphs, makes sure 
that the delay is polynomial. The other method is allowed to spend an exponential number of steps 
between consecutive graphs, but these steps are spread over several iterations of the method that 
lists easy graphs. Effectively this gives an algorithm with polynomial delay. 

It is clear that this method fundamentally relies on the property that many graphs are ‘easy’. 
This property does not hold for sets of graphs defined by a monotonic predicate. Let us illustrate 
this for the monotonic constraint that every node in a graph has a degree of at most three. If n > 3, 
it is easily seen that 


• as the degree is at most 3, the number of graphs that contain a vertex that is connected to all
other vertices is independent of n;


• every graph g that contains a single node v of maximum degree 3, consists, after removal of
v, only of a set of (possibly unconnected) paths, hence $g - v$ is not rigid.


Consequently, |E(n)| is bounded by a constant independent of n, while |U(n)| grows with n. The
condition of Goldberg’s approach is therefore not satisfied for this class of graphs.

Moreover, to list all graphs in H(n) in time O(n*|U(n)|), it is assumed that the average size of
the automorphism groups of the elements of U(n) is bounded. However, one can find subclasses for 
which this bound is not polynomial. 

The most popular application of graph mining algorithms is in chemistry (Horvath et al., 2006). 
Most of the graphs in these databases have a degree bounded by four, and a majority of the subgraphs 
that need to be enumerated have a degree bounded by three. Thus, we do not believe that the 
conditions for Goldberg’s method are satisfied in such data. 


4. Problem Statement 
Our problem setting has two parameters: 


• an augmentation operator, which takes as input a graph, and outputs a set of augmentations of
this graph, and whose closure, starting from a given set of graphs, describes a class of graphs;


• a predicate which restricts this class.


For instance, the augmentation operator can be used to describe the class of connected or uncon- 
nected graphs, while the boolean predicate can restrict this class further to those graphs that have 
bounded degree. 

More formally, we will denote by $VE$ the set that contains all pairs $(r_V, r_E)$ where $r_V$ is a
set of vertices and $r_E$ a set of edges (not necessarily between vertices in $r_V$). We will use set
operators on elements of $VE$ to denote the corresponding operations on their components, for
example, $(r_V, r_E) \cup (r'_V, r'_E) = (r_V \cup r'_V, r_E \cup r'_E)$. Again, $V(r)$ and $E(r)$ describe
the components of an element $r \in VE$. An augmentation operator $\rho^+$ is a function that takes as
input a graph, and outputs a set of descriptions of possible augmentations. This set is a subset of
$VE$. Every element $r \in \rho^+(g)$ describes a new graph $(V(g) \cup V(r), E(g) \cup E(r))$,
abbreviated by $g + r$, that we call a child of g.

An example of an augmentation operator is 


$$\rho_t^+(g) = \{(\{v_{new}\}, \{\{v, v_{new}\}\}) \mid v \in V(g)\},$$

where $v_{new}$ is a new vertex (not belonging to V(g)). This operator adds a new vertex and connects
it to an existing vertex. We can use this operator to describe the set of all (connected) trees. The
minimal graph on which we apply the operator is in this case the graph with one node
$T_t = (\{v\}, \emptyset)$; in general, when we use one graph as the minimal element, we will denote
this initial graph with $T$.

The following operator enumerates all graphs:

$$\rho_a^+(g) = \{(\{v_{new}\}, \emptyset)\} \cup \{(\emptyset, \{\{v_1, v_2\}\}) \mid v_1, v_2 \in V(g) \wedge \{v_1, v_2\} \notin E(g)\}, \qquad (1)$$

with $T_a = (\emptyset, \emptyset)$, while the following allows for enumerating all connected graphs:

$$\rho_c^+(g) = \rho_t^+(g) \cup \{(\emptyset, \{\{v_1, v_2\}\}) \mid v_1, v_2 \in V(g) \wedge \{v_1, v_2\} \notin E(g)\}, \qquad (2)$$

with $T_c = (\{v\}, \emptyset)$.
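
The three example operators translate almost literally into code. The sketch below is an illustration only: graphs and augmentations are both represented as pairs of frozensets, vertices are assumed to be integers so that a fresh vertex can be chosen as the maximum plus one, and all function names are hypothetical.

```python
from itertools import combinations

def fresh_vertex(g):
    """Pick a vertex name that does not occur in g (vertices are integers)."""
    return max(g[0], default=-1) + 1

def rho_t(g):
    """Tree operator: attach a new vertex to one existing vertex."""
    v_new = fresh_vertex(g)
    return [(frozenset({v_new}), frozenset({frozenset({v, v_new})}))
            for v in sorted(g[0])]

def missing_edges(g):
    """Augmentations that add one edge between two existing, non-adjacent vertices."""
    vs, es = g
    return [(frozenset(), frozenset({frozenset(pair)}))
            for pair in combinations(sorted(vs), 2)
            if frozenset(pair) not in es]

def rho_a(g):
    """Operator (1): add an isolated vertex or a missing edge."""
    return [(frozenset({fresh_vertex(g)}), frozenset())] + missing_edges(g)

def rho_c(g):
    """Operator (2): attach a new vertex or add a missing edge."""
    return rho_t(g) + missing_edges(g)

def add(g, r):
    """The child g + r."""
    return (g[0] | r[0], g[1] | r[1])

def involved_vertices(r):
    """All vertices mentioned by r, either directly or as an endpoint of an edge."""
    return r[0] | frozenset(v for e in r[1] for v in e)

minimal_connected = (frozenset({0}), frozenset())
path = (frozenset({0, 1, 2}), frozenset({frozenset({0, 1}), frozenset({1, 2})}))
print([add(minimal_connected, r) for r in rho_c(minimal_connected)])
print(len(rho_c(path)), len(rho_a(path)))
```

For a graph with n vertices and m edges, `rho_c` returns n + n(n-1)/2 - m augmentations, each touching at most two vertices.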

As we can see in these examples, the vertices occurring in edges $E(r)$ do not have to occur in
$V(r)$. Still it is useful to determine the entire set of vertices involved in an augmentation. For this
we use the notation $V^*$, that is, $V^*(r) = V(r) \cup \{v \mid \exists e \in E(r) : v \in e\}$. Observe
that the example operators output a number of augmentations that is bounded by a polynomial in the
size of g, and that for each r, the size of the set $V^*(r)$ is bounded by a constant.

The class of graphs defined by taking the closure of the augmentation operator on the minimal
element is denoted by $\mathcal{L}_{\rho^+}$ (which we shorten further to $\mathcal{L}_a$ and $\mathcal{L}_c$
for the classes defined by $\rho_a^+$ and $\rho_c^+$). The operator $\rho^+$ defines an ancestry relation
between the graphs. This relation is a partial order.

The second parameter of our problem setting is a predicate p on graphs. The set of graphs g in
$\mathcal{L}_{\rho^+}$ such that p(g) is true is denoted by $\mathcal{L}_{\rho^+,p}$. We only consider
predicates that cannot distinguish between isomorphic graphs, that is, if $g \simeq g'$ then
$p(g) = p(g')$. We call a predicate monotonic w.r.t. an augmentation operator $\rho^+$ if for every
graph $g \in \mathcal{L}_{\rho^+,p}$ it also holds that $g' \in \mathcal{L}_{\rho^+,p}$ for every
$g'$ that is an ancestor of g. For instance, the predicate that tests if a graph has bounded degree is
monotonic under $\rho_a^+$ as defined in (1).

In this article, we consider the following problem. 


Problem 1 Given are an augmentation operator $\rho^+$ and a predicate p which is monotonic w.r.t.
$\rho^+$. Then, enumerate all elements in $\mathcal{L}_{\rho^+,p}$ such that exactly one representative
of every equivalence class under isomorphism of $\mathcal{L}_{\rho^+,p}$ is enumerated.


In the next section we determine a set of sufficient conditions on the augmentation operator and the 
monotonic predicate that have to be fulfilled in order to obtain an algorithm with polynomial delay. 


5. Main Result 


The augmentation operator that we introduced in the previous section generates the children of a
graph. Our algorithm relies on the existence of an operator which can invert this operator. We call
this operator a reduction operator. The reduction operator generates the parents of a graph.

Formally, the definition of a reduction operator is similar to that of an augmentation operator;
the input of a reduction operator $\rho^-$ is a single graph, its output consists of a subset of $VE$. We
call each element $r \in \rho^-(g)$ a reduction of g. It defines a graph
$(V(g) \setminus V(r), E(g) \setminus E(r))$, which is abbreviated by $g - r$.

For instance, in the case of connected graphs, we can define the following reduction operator: 


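$$\rho_c^-(g) = \{r \mid r = (\emptyset, \{\{v_1, v_2\}\}) \wedge \{v_1, v_2\} \in E(g) \wedge \{v_1, v_2\} \text{ is in a cycle}\}$$
$$\qquad \cup\ \{r \mid r = (\{v_1\}, \{\{v_1, v_2\}\}) \wedge \{v_1, v_2\} \in E(g) \wedge v_1 \text{ has degree } 1\} \qquad (3)$$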
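
A direct, unoptimized reading of the reduction operator in (3) is sketched below. The function names are hypothetical; the test for "the edge is in a cycle" is implemented by checking that its endpoints remain connected once the edge is removed, which is an implementation choice of this sketch.

```python
def reachable(edges, source, target):
    """Depth-first search: is target reachable from source using the given edges?"""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for e in edges:
            if u in e:
                for w in e - {u}:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
    return target in seen

def rho_c_minus(g):
    """Reductions of a connected graph according to (3)."""
    vs, es = g
    reductions = []
    for e in es:
        u, w = tuple(e)
        # An edge lies on a cycle iff its endpoints stay connected without it.
        if reachable(es - {e}, u, w):
            reductions.append((frozenset(), frozenset({e})))
        # A vertex of degree 1 can be removed together with its only edge.
        for v in (u, w):
            if sum(v in f for f in es) == 1:
                reductions.append((frozenset({v}), frozenset({e})))
    return reductions

def subtract(g, r):
    """The parent g - r."""
    return (g[0] - r[0], g[1] - r[1])

def edge_set(*pairs):
    return frozenset(frozenset(p) for p in pairs)

triangle = (frozenset({0, 1, 2}), edge_set((0, 1), (1, 2), (0, 2)))
path = (frozenset({0, 1, 2}), edge_set((0, 1), (1, 2)))
kite = (frozenset({0, 1, 2, 3}), edge_set((0, 1), (1, 2), (0, 2), (2, 3)))
print(len(rho_c_minus(triangle)), len(rho_c_minus(path)), len(rho_c_minus(kite)))
```

In the triangle every edge lies on a cycle, the path only allows removing a leaf, and the triangle with a pendant vertex allows both kinds of reductions.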


Let $\mathcal{L}$ be a class of graphs. Then, an augmentation schema on $\mathcal{L}$ is a pair
$(\rho^+, \rho^-)$ of an augmentation operator $\rho^+$ and a reduction operator $\rho^-$, such that


• $\forall g \in \mathcal{L}, \forall r \in \rho^+(g) : g + r \in \mathcal{L} \wedge g \cap r = (\emptyset, \emptyset)$,
that is, $\rho^+(g)$ contains augmentations that can be added to g to obtain a larger graph (child);


• $\forall g \in \mathcal{L}, \forall r \in \rho^-(g) : g - r \in \mathcal{L} \wedge r \subseteq g$,
that is, $\rho^-(g)$ contains reductions that can be removed from g to obtain a parent;


• $\forall g \in \mathcal{L}, \forall r \in \rho^+(g) : r \in \rho^-(g + r)$, that is, the effects of the
additions $r \in \rho^+(g)$ can be inverted by a deletion from $\rho^-(g + r)$;


• $\forall g \in \mathcal{L}, \forall r \in \rho^-(g) : \exists r' \in \rho^+(g - r), \exists \varphi : ((g - r) + r' \simeq_\varphi g) \wedge (I_{g-r} \subseteq \varphi)$,
that is, deletions $r \in \rho^-(g)$ can be inverted by additions from $\rho^+(g - r)$. Here
$I_{g-r} = \{(v, v) \mid v \in V(g - r)\}$ is the identity permutation over the vertices of $g - r$;


• $\forall g_1, g_2 \in \mathcal{L} : g_1 \simeq_\varphi g_2 \Rightarrow \forall r \in \rho^+(g_1) : \varphi(r) \in \rho^+(g_2)$,
that is, $\rho^+$ (and hence also $\rho^-$) is invariant to isomorphisms.


Given a graph g and two reductions $r_1, r_2 \in \rho^-(g)$, we are interested in applying both $r_1$ and
$r_2$ to g. However, sometimes this is not possible directly. Consider for instance the class of
connected graphs $\mathcal{L}_c$, the graph $g = (\{1,2,3\}, \{\{1,2\}, \{2,3\}, \{3,1\}\})$, and the
reductions $r_1 = (\emptyset, \{\{1,2\}\})$ and $r_2 = (\emptyset, \{\{2,3\}\})$. We have
$f_1 = g - r_1 = (\{1,2,3\}, \{\{2,3\}, \{3,1\}\})$. We cannot apply $r_2$ to $f_1$ as this would result
in a graph which is not in $\mathcal{L}_c$, due to the isolated node 2; $r_2$ is not an allowed
reduction in $g - r_1$. We can however map the reduction $r_2$ to a reduction that is allowed; instead
of $r_2$ we use $(\{2\}, \{\{2,3\}\})$, which is a valid deletion from $f_1$. This translation of $r_2$ to
the context of $g - r_1$ for $\rho_c^-$ is denoted by $r_2 \uparrow_c^g r_1$. More formally, for connected
graphs $\mathcal{L}_c$, we define $r_2 \uparrow_c^g r_1$ to be equal to $r_2$, except in the case where
$r_1 = (\emptyset, \{\{v, u_1\}\})$, $r_2 = (\emptyset, \{\{v, u_2\}\})$ and v has degree 2, in which case
$r_2 \uparrow_c^g r_1 = r_2 \cup (\{v\}, \emptyset)$. Then,
$(g - r_1) - (r_2 \uparrow_c^g r_1) = (\{1,3\}, \{\{1,3\}\})$.
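
The triangle example can be replayed in code. The sketch below implements only the case of the translator for connected graphs that is spelled out in the text (two edge reductions sharing a vertex of degree 2); the function names are hypothetical, and any further cases that a full definition might need are not covered.

```python
def subtract(g, r):
    """g - r: remove the vertices and edges listed in the reduction r."""
    return (g[0] - r[0], g[1] - r[1])

def union(r, s):
    """Component-wise union of two elements of VE."""
    return (r[0] | s[0], r[1] | s[1])

def translate_c(g, r2, r1):
    """Translate r2 to the context of g - r1, for connected graphs (sketch)."""
    (rv1, re1), (rv2, re2) = r1, r2
    if not rv1 and not rv2 and len(re1) == 1 and len(re2) == 1:
        (e1,), (e2,) = re1, re2
        shared = e1 & e2
        if e1 != e2 and len(shared) == 1:
            (v,) = shared
            # Removing both edges would isolate v, so v is removed as well.
            if sum(v in e for e in g[1]) == 2:
                return (rv2 | {v}, re2)
    return r2

def edge_set(*pairs):
    return frozenset(frozenset(p) for p in pairs)

g = (frozenset({1, 2, 3}), edge_set((1, 2), (2, 3), (3, 1)))
r1 = (frozenset(), edge_set((1, 2)))
r2 = (frozenset(), edge_set((2, 3)))

t21 = translate_c(g, r2, r1)
grandparent = subtract(subtract(g, r1), t21)
print(t21)
print(grandparent)
```

Applying the two reductions in the opposite order, with the roles of `r1` and `r2` swapped, removes the same vertices and edges and leads to the same grandparent.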

We now more formally introduce the properties of these reduction translators. A reduction
translator is an operator $\cdot \uparrow^{\cdot} \cdot$ mapping graphs g and reductions
$r_1, r_2 \in \rho^-(g)$ to a new reduction $r_2 \uparrow^g r_1$ satisfying

$$\forall g, r_1, r_2 : r_2 \subseteq r_2 \uparrow^g r_1, \qquad (4)$$

that is, after the translator is applied the reduction can only become larger,

$$\forall g, r_1, r_2 : r_1 \cap (r_2 \uparrow^g r_1) = (\emptyset, \emptyset), \qquad (5)$$

that is, no element is removed twice, and

$$\forall g, r_1, r_2 : (r_2 \uparrow^g r_1) \cup r_1 = (r_1 \uparrow^g r_2) \cup r_2, \qquad (6)$$

that is, reversing the order does not affect which vertices and edges are removed.

To allow for an efficient enumeration of graphs by our algorithm, we require that the augmentation
schema of the class of graphs fulfills a density property. This property is based on the graph of
parents. The graph of parents $GoP_{\rho^-,\uparrow}(g)$ of a graph g is defined as follows:


• $V(GoP_{\rho^-,\uparrow}(g)) = \rho^-(g)$, that is, every reduction for g corresponds to a vertex in
its graph of parents.

• For any two $r_1, r_2 \in V(GoP_{\rho^-,\uparrow}(g))$, the graph $GoP_{\rho^-,\uparrow}(g)$ has edge
$\{r_1, r_2\}$ iff $(r_1 \uparrow^g r_2) \in \rho^-(g - r_2)$ and $(r_2 \uparrow^g r_1) \in \rho^-(g - r_1)$.

The intuition is that there is an edge between two nodes in the graph of parents iff it is possible to
go from one parent to the other by first applying a reduction, and then applying an augmentation.
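
The definition of the graph of parents can be evaluated mechanically on small connected graphs. The following sketch combines a naive reduction operator for connected graphs with the single translator case described earlier in the text; all function names are hypothetical and the construction is quadratic in the number of reductions, with no attempt at efficiency.

```python
from itertools import combinations

def reachable(edges, source, target):
    """Depth-first search over the given edges."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for e in edges:
            if u in e:
                for w in e - {u}:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
    return target in seen

def rho_c_minus(g):
    """Reductions for connected graphs: cycle edges and degree-1 vertices."""
    vs, es = g
    out = []
    for e in es:
        u, w = tuple(e)
        if reachable(es - {e}, u, w):
            out.append((frozenset(), frozenset({e})))
        for v in (u, w):
            if sum(v in f for f in es) == 1:
                out.append((frozenset({v}), frozenset({e})))
    return out

def subtract(g, r):
    return (g[0] - r[0], g[1] - r[1])

def translate_c(g, r2, r1):
    """Only the translator case given in the text is implemented."""
    (rv1, re1), (rv2, re2) = r1, r2
    if not rv1 and not rv2 and len(re1) == 1 and len(re2) == 1:
        (e1,), (e2,) = re1, re2
        shared = e1 & e2
        if e1 != e2 and len(shared) == 1:
            (v,) = shared
            if sum(v in e for e in g[1]) == 2:
                return (rv2 | {v}, re2)
    return r2

def graph_of_parents(g):
    """Nodes are the reductions of g; edges follow the two membership tests."""
    nodes = rho_c_minus(g)
    links = set()
    for r1, r2 in combinations(nodes, 2):
        ok12 = translate_c(g, r1, r2) in rho_c_minus(subtract(g, r2))
        ok21 = translate_c(g, r2, r1) in rho_c_minus(subtract(g, r1))
        if ok12 and ok21:
            links.add(frozenset({r1, r2}))
    return nodes, links

def edge_set(*pairs):
    return frozenset(frozenset(p) for p in pairs)

triangle = (frozenset({1, 2, 3}), edge_set((1, 2), (2, 3), (3, 1)))
nodes, links = graph_of_parents(triangle)
print(len(nodes), len(links))
```

For the triangle, every pair of edge reductions is linked, so its graph of parents is itself a triangle and in particular connected.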


An example is shown in Figure 1.

We say that $(\rho^+, \rho^-, \uparrow)$ is a dense augmentation schema for $\mathcal{L}$ iff


1. $(\rho^+, \rho^-)$ is an augmentation schema for $\mathcal{L}$ and $\uparrow$ is a reduction translator.
2. for every non-minimal graph $g \in \mathcal{L}$ it holds that either the graph of parents
$GoP_{\rho^-,\uparrow}(g)$ is connected, or $g - r$ is a minimal element of $\mathcal{L}$ for all
$r \in \rho^-(g)$ (in most cases, $g - r = T$).

Figure 1: On the bottom-left, a connected graph g is drawn. In the middle, there is the graph of
parents $GoP_{\rho_c^-,\uparrow_c}(g)$. For each of its vertices $r_i$, the corresponding parent $g - r_i$ of g
is drawn. For the edges $\{r_1, r_4\}$ and $\{r_3, r_5\}$, the common (grand)parent is drawn. Note that
there is no edge between, for example, $r_3$ and $r_4$ as $(g - r_3) - (r_4 \uparrow_c^g r_3)$ is not a
connected graph.


Let us return to the examples of graph classes. For the class of all graphs $\mathcal{L}_a$, we define
$r_1 \uparrow_a^g r_2 = r_1$. It is easy to show that $GoP_{\rho_a^-,\uparrow_a}(g)$ is a clique and that
$(\rho_a^+, \rho_a^-, \uparrow_a)$ is a dense augmentation schema. We can also prove for connected graphs
that $(\rho_c^+, \rho_c^-, \uparrow_c)$ is a dense augmentation schema (see the appendix for a proof).

The main result of this paper is the following: 


Theorem 1 Given an augmentation operator $\rho^+$ and a predicate p, we can solve Problem 1 with
polynomial delay if the following conditions are satisfied:


• there exists a reduction operator $\rho^-$ and a reduction translator $\uparrow$ such that
$(\rho^+, \rho^-, \uparrow)$ is a dense augmentation schema;


• the predicate p is monotonic w.r.t. $\rho^+$ over $\mathcal{L}_{\rho^+}$;

• $\rho^+$, $\rho^-$ and p can be evaluated in polynomial time in their arguments;


• the number of vertices in $V^*(r)$ for every possible r resulting from $\rho^+$ and $\rho^-$ is bounded by
a constant.


There are many classes of graphs for which these conditions are fulfilled. Below we give a 
non-exhaustive list of examples. 


5.1 Monotonic Classes 


Any predicate p that is monotonic w.r.t. edge and vertex deletion and that can be evaluated in
polynomial time defines a class $\mathcal{L}_{\rho_a^+,p}$ that satisfies the conditions of Theorem 1, as
$(\rho_a^+, \rho_a^-, \uparrow_a)$ is a dense augmentation schema, and all operations are polynomial.
Hence, our algorithm can efficiently enumerate all these monotonic classes.

An interesting special case is that of the minor-closed classes of graphs. The minors of a graph can
be obtained by deleting edges, deleting vertices, or identifying adjacent vertices into a single new
vertex that is adjacent to all vertices that were adjacent to any of the identified original vertices.
A class is minor-closed if for any graph g all its minors are also included in the class. Some classes
of graphs can be characterized by forbidden minors, that is, minors that none of the graphs are
allowed to have. One example is the class of all planar graphs (the forbidden minors being the
5-clique $K_5$ and the complete bipartite graph $K_{3,3}$). It can be determined in polynomial time if a
graph contains a given forbidden minor, and minor-closed classes of graphs are monotonic for $\rho_a^+$.


5.2 Connected Graphs 


It is proven in the appendix that the augmentation schema $(\rho_c^+, \rho_c^-, \uparrow_c)$ is dense and its operators are polynomial. It follows that connected graphs can be enumerated with polynomial delay. Any predicate which is closed under edge and vertex deletion is also monotonic w.r.t. $\rho_c^+$. Therefore, for any such monotonic predicate $p$, our algorithm can list all connected graphs satisfying $p$ with polynomial delay.


5.3 Hereditary Classes with Bounded Degree 


A predicate $p$ is hereditary iff for every graph $g_1$ such that $p(g_1)$ holds, $p(g_2)$ holds for any induced subgraph $g_2$ of $g_1$. As pointed out in Section 2, an induced subgraph must contain all edges between a selected set of nodes in the original graph. A hereditary predicate is monotonic w.r.t. the following augmentation operator:

$$\rho_i^+(g) = \{ (\{v_{new}\}, \{\{v_{new}, v\} \mid v \in V\}) \mid V \subseteq V(g) \},$$

where again $v_{new}$ is a new vertex not in $V(g)$. The corresponding reduction operator should remove every single vertex as well as all edges emanating from it. It can be shown that the resulting augmentation schema is dense. However, the augmentation operator $\rho_i^+$ outputs a number of augmentations exponential in the size of the input. If we restrict ourselves to graphs with bounded degree, however, we can enumerate the resulting class with polynomial delay.
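The blow-up, and the effect of a degree bound, can be made concrete with a minimal sketch. The helper `induced_augmentations` and its `max_degree` parameter are hypothetical names for this illustration; the sketch only caps the degree of the new vertex and does not check the degrees of the existing vertices, which a full implementation for a bounded-degree class would also have to do.

```python
from itertools import combinations

def induced_augmentations(vertices, max_degree=None):
    """Yield (new_vertex, new_edges) pairs, one per admissible neighbour set.

    Hypothetical helper: the new vertex is labelled max(vertices) + 1 and
    max_degree caps the size of the neighbour set of the new vertex.
    """
    vertices = sorted(vertices)
    v_new = (vertices[-1] + 1) if vertices else 0
    largest = len(vertices) if max_degree is None else min(max_degree, len(vertices))
    for k in range(largest + 1):
        for neighbours in combinations(vertices, k):
            yield v_new, [(v_new, u) for u in neighbours]

# Every subset of the 10 existing vertices is a candidate neighbour set.
unbounded = sum(1 for _ in induced_augmentations(range(10)))
# With at most 3 neighbours the count is a polynomial in the graph size.
bounded = sum(1 for _ in induced_augmentations(range(10), max_degree=3))
```

For 10 vertices the unbounded operator yields 2^10 = 1024 augmentations, while the bound of 3 leaves 1 + 10 + 45 + 120 = 176.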

One example is the class of claw-free graphs. A graph is called claw-free if the graph

$$(\{v_1, v_2, v_3, v_4\}, \{\{v_1, v_2\}, \{v_1, v_3\}, \{v_1, v_4\}\})$$

is not one of its induced subgraphs. Claw-freeness is not a monotonic property in $\mathcal{L}_{\rho_a^+}$, as a subgraph of a graph without claws can contain a claw: for instance, consider the clique $K_5$; this graph does not have a claw as induced subgraph (all its induced subgraphs also being cliques), but many of its (ordinary) subgraphs are claws.
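The $K_5$ example can be checked directly. The following is a brute-force sketch (the function name is an assumption of this example) that looks for a vertex with three pairwise non-adjacent neighbours, which is exactly an induced claw.

```python
from itertools import combinations

def has_induced_claw(vertices, edges):
    """True iff some vertex has three pairwise non-adjacent neighbours."""
    edge_set = {frozenset(e) for e in edges}
    for centre in vertices:
        nbrs = [u for u in vertices if frozenset((centre, u)) in edge_set]
        for a, b, c in combinations(nbrs, 3):
            if not any(frozenset(p) in edge_set for p in ((a, b), (a, c), (b, c))):
                return True
    return False

k5_vertices = [1, 2, 3, 4, 5]
k5_edges = list(combinations(k5_vertices, 2))
claw_edges = [(1, 2), (1, 3), (1, 4)]  # an ordinary (non-induced) subgraph of K5

k5_is_claw_free = not has_induced_claw(k5_vertices, k5_edges)
subgraph_is_claw = has_induced_claw([1, 2, 3, 4], claw_edges)
```

Deleting edges from $K_5$ therefore leads from a graph satisfying the predicate to one violating it, which is why edge deletion cannot be used as the reduction for this class.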

Claw-freeness is, however, a hereditary property, as every induced subgraph of a claw-free graph is itself claw-free. To the best of our knowledge no polynomial algorithm is currently known for enumerating all claw-free graphs, even if we do allow for duplicates (Goldberg, 1992). Our algorithm partially solves this problem by listing all claw-free graphs with bounded degree.


6. Automorphism Groups and Bases 


In order to avoid enumerating duplicates, we will use some theory on automorphism groups. In this 
section, we will briefly review the necessary concepts. 

An automorphism of a graph $g$ is an isomorphism between $g$ and itself. We will denote the identity automorphism with $I_g$. In Figure 2, a graph $g^{ex}$ with 8 vertices is shown, together with six of its automorphisms. An automorphism is a permutation of the vertices of $g$. The set of all automorphisms of $g$ equipped with composition of permutations forms a permutation group acting on $V(g)$. This group is called the automorphism group of $g$, which we denote with $Aut(g)$.

Figure 2: A graph $g^{ex}$ and 6 of its automorphisms.

Let $P$ be a permutation group and let $S \subseteq P$. We say $S$ is a set of generators of $P$ iff every element of $P$ can be written as a composition of elements of $S$. We denote this fact with $P = \langle S \rangle$. For example, in Figure 2, $\{\varphi_1, \varphi_2, \varphi_3, \varphi_5\}$ is a (non-minimal) set of generators of $Aut(g^{ex})$.
Automorphism @4 can be composed by 94 = ©3 0) 03. Ye can be composed as Mp = Ọ1 03 OQ. 

Let $P$ be a permutation group acting on $V$ and let $v \in V$. The stabilizer of $v$ in $P$ is the subgroup $P_v = \{\varphi \in P \mid \varphi(v) = v\}$. Consider for example the automorphism group $P = Aut(C_5)$ of the 5-cycle graph $C_5 = (\{v_1, v_2, v_3, v_4, v_5\}, \{\{v_1, v_2\}, \{v_2, v_3\}, \{v_3, v_4\}, \{v_4, v_5\}, \{v_5, v_1\}\})$. This group has 10 elements, each of which can be decomposed into (possibly) a mirror permutation $\{(v_1, v_1), (v_2, v_5), (v_5, v_2), (v_3, v_4), (v_4, v_3)\}$ and rotations (0, 1 or more applications of $\{(v_1, v_2), (v_2, v_3), (v_3, v_4), (v_4, v_5), (v_5, v_1)\}$). For any vertex $v$ of $C_5$, the stabilizer $P_v$ contains precisely 2 elements. E.g., $P_{v_3}$ contains the identity permutation and the mirror $\{(v_3, v_3), (v_4, v_2), (v_2, v_4), (v_1, v_5), (v_5, v_1)\}$.
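The numbers in the $C_5$ example can be reproduced by brute force. This is only a sketch for small graphs (it tries all $n!$ vertex permutations), and the function names are assumptions of this example; vertices $v_1, \ldots, v_5$ are written as the integers 1 to 5.

```python
from itertools import permutations

def automorphisms(vertices, edges):
    """Brute-force Aut(g): all vertex bijections that map the edge set onto itself."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    result = []
    for image in permutations(vertices):
        phi = dict(zip(vertices, image))
        if {frozenset((phi[a], phi[b])) for a, b in edges} == edge_set:
            result.append(phi)
    return result

def stabilizer(group, v):
    """The permutations of the group that fix the point v."""
    return [phi for phi in group if phi[v] == v]

c5_vertices = [1, 2, 3, 4, 5]
c5_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
aut_c5 = automorphisms(c5_vertices, c5_edges)
stab_v3 = stabilizer(aut_c5, 3)
```

Here `aut_c5` has 10 elements and `stab_v3` contains the identity together with the mirror that swaps 2 with 4 and 1 with 5.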

One can apply the definition of stabilizers recursively; we will denote with $P_{v_1,\ldots,v_k}$ the stabilizer of $v_k$ in $P_{v_1,\ldots,v_{k-1}}$. A sequence of points $B = [v_1 \ldots v_n]$ of $V$ is called a base for the permutation group $P$ iff $P_{v_1,\ldots,v_n}$ only contains the identity permutation. For example, in Figure 2, $B_1 = [3,1,6,8]$ is a base for $Aut(g^{ex})$. Indeed, only the identity automorphism $I_{g^{ex}}$ leaves all four vertices 1, 3, 6 and 8 fixed. Also $B_2 = [3,1,6,8,2,4,5,7]$ is a base for $Aut(g^{ex})$.
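The base condition can be tested by computing the chain of stabilizers explicitly. Since the graph of Figure 2 is not reproduced here, the sketch below uses the 5-cycle $C_5$ instead; the function names are assumptions of this example and the brute-force automorphism search is only meant for small graphs.

```python
from itertools import permutations

def automorphisms(vertices, edges):
    """Brute-force Aut(g): all vertex bijections that map the edge set onto itself."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    return [
        phi
        for phi in (dict(zip(vertices, image)) for image in permutations(vertices))
        if {frozenset((phi[a], phi[b])) for a, b in edges} == edge_set
    ]

def is_base(group, points):
    """True iff only the identity fixes every point of the sequence."""
    remaining = group
    for v in points:
        # Stabilize the points one after another, as in the recursive definition.
        remaining = [phi for phi in remaining if phi[v] == v]
    return all(phi[x] == x for phi in remaining for x in phi)

aut_c5 = automorphisms([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)])
single_point_is_base = is_base(aut_c5, [1])
two_points_are_base = is_base(aut_c5, [1, 2])
```

A single vertex of $C_5$ is not a base, because the mirror through that vertex still fixes it, whereas two adjacent vertices already form a base.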

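Let $P$ be a permutation group, $B = [v_1 \ldots v_n]$ a base of $P$ and $S \subseteq P$. $S$ is called a strong generating set related to $B$ iff for every $k$ with $0 \le k \le n$ the generators in $S$ that fix $v_1, \ldots, v_k$ generate the stabilizer $P_{v_1,\ldots,v_k}$. We will use the abbreviation BSGS for a pair $(B, S)$ where $B$ is a base and $S$ is a strong generating set related to $B$.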

Consider the base $B_1 = [3,1,6,8]$ for the group $Aut(g^{ex})$ in Figure 2. We can construct a strong generating set by first choosing generators for $Aut(g^{ex})_{3,1,6,8}$, then choosing generators for $Aut(g^{ex})_{3,1,6}$, $Aut(g^{ex})_{3,1}$ and so on until we have generators for $Aut(g^{ex})$. Each time, we use the generators of the subgroup and extend them to a set of generators for the larger group. In our example, $\{\varphi_4\}$ is a set of generators for $Aut(g^{ex})_{3,1,6}$. Next, $\{\varphi_4, \varphi_5\}$ is a set of generators for $Aut(g^{ex})_{3,1}$ and $\{\varphi_4, \varphi_5, \varphi_1\}$ is a set of generators for $Aut(g^{ex})_3$. Finally, $\{\varphi_4, \varphi_5, \varphi_1, \varphi_3, \varphi_2\}$ is a set of generators for $Aut(g^{ex})$ and therefore a strong generating set for $B_1$.

A BSGS can represent a permutation group acting on $n$ elements using only $O(n \log(n))$ generators. For example, even though in Figure 2 the automorphism group $Aut(g^{ex})$ contains $2 \cdot 3 \cdot 2 \cdot 3 \cdot 2 = 72$ elements, only 5 generators are needed to represent it. Moreover, an $O(n^5)$ algorithm exists to transform a BSGS into a BSGS with a different base (Butler, 1991). In Figure 2, consider the BSGS $([3,1,6,8], \{\varphi_4, \varphi_5, \varphi_1, \varphi_3, \varphi_2\})$ and suppose we want a strong generating set for the base $[3,1,8,6]$. Then, we have to combine the generators of $Aut(g^{ex})_{3,1,6}$ and $Aut(g^{ex})_{3,1}$. One possible strong generating set is $\{\varphi_4 \circ \varphi_5, \varphi_5, \varphi_1, \varphi_3, \varphi_2\}$.

In general, it is easy to see that we can reduce any strong generating set for a permutation group on $n$ elements to at most $n(n-1)/2$ elements. Let $B = \{v_1, \ldots, v_k\}$ be a base ($k \le n$), and let $S_i$ ($1 \le i \le k$) be the subset of the strong generating set containing all generators fixing $v_j$ for all $1 \le j < i$ but mapping $v_i$ on a different element. Now for all $i$ from 1 to $k$ we can do the following. As long as any $S_i$ has more than $n - i$ elements, there are two permutations $p_1, p_2 \in S_i$ such that $p_1(v_i) = p_2(v_i)$, and we can remove $p_2$ from $S_i$ and add $p_2 \circ p_1^{-1}$ to $S_{i+1}$. The permutation group generated by the new strong generating set $\bigcup_i S_i$ remains the same, as any permutation requiring $p_2$ in its construction can still be constructed with $p_2 \circ p_1^{-1} \circ p_1$. After performing such replacements until all $S_i$ contain at most $n - i$ elements, we have a strong generating set of size at most $n(n-1)/2$.

An important property is that given a strong generating set for some base, one can efficiently compute a strong generating set for another base. A basic step in such a base change is to interchange two vertices, that is, given a strong generating set for $B = \{v_1 \ldots v_n\}$, find a strong generating set for $B' = \{v_1 \ldots v_{l-1}, v_{l+1}, v_l, v_{l+2} \ldots v_n\}$ for some $l$ with $1 \le l < n$. Let $S$ be a strong generating set with $S = \bigcup_{i=1}^{n} S_i$ where $S_i$ contains the generators fixing $v_j$ ($1 \le j < i$) and not fixing $v_i$. The only parts of $S$ that should be changed when swapping $v_l$ and $v_{l+1}$ in the base are $S_l$ and $S_{l+1}$. Let $S' = S'_l \cup S'_{l+1} \cup \bigcup_{i \notin \{l, l+1\}} S_i$ be the strong generating set for the new base $B'$, where $S'_l$ fixes $v_i$ ($1 \le i < l$) and $S'_{l+1}$ fixes $v_i$ ($1 \le i < l$) and $v_{l+1}$. One can construct $S'_l$ by ensuring it contains permutations that map $v_{l+1}$ on all its possible images. These images can be found with a so-called reachability graph. This means one starts with a set of possible images $I = \{v_{l+1}\}$, and as long as there is a permutation in $\bigcup_{i=l}^{n} S_i$ which maps an element $v \in I$ on an element $v' \notin I$, one adds $v'$ to $I$; in the meantime, for each element $v'$ one maintains a corresponding permutation. After constructing $S'_l$, one can construct $S'_{l+1}$ by starting with $S'_{l+1} = S_l \cup S_{l+1}$ and then removing any permutations from it that are redundant. In this way, it is possible to find in polynomial time a strong generating set for a base in which two vertices have been swapped. By iterating this procedure, it is possible to efficiently find a strong generating set for any new base. Note that we have only described one naive strategy here. In the literature, much more advanced algorithms have been proposed, which perform these operations significantly more efficiently.
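The reachability step described above, growing a set of images while remembering one permutation per image, can be sketched as follows. Permutations are represented as dictionaries, the function name is an assumption of this example, and the rotation and mirror of $C_5$ from earlier in this section serve as generators; this illustrates only the image search, not a complete base change.

```python
def orbit_with_transversal(start, generators):
    """Return {image: permutation}, where the permutation maps start to image.

    Permutations are dicts on the same point set; compose(f, g) applies f
    first and then g. The generator list is assumed to be non-empty.
    """
    def compose(f, g):
        return {x: g[f[x]] for x in f}

    identity = {x: x for x in generators[0]}
    reached = {start: identity}
    frontier = [start]
    while frontier:
        v = frontier.pop()
        for gen in generators:
            w = gen[v]
            if w not in reached:
                # A new image: record one permutation that realizes it.
                reached[w] = compose(reached[v], gen)
                frontier.append(w)
    return reached

rotation = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}
mirror = {1: 1, 2: 5, 5: 2, 3: 4, 4: 3}
images = orbit_with_transversal(2, [rotation, mirror])
```

Each point is added once and each generator is tried once per point, so the search takes a number of permutation compositions that is linear in the number of images times the number of generators.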


7. Algorithm Outline 


Our algorithm enumerates the graphs ordered by size, that is, no graph will be output before all its ancestors under $\rho^+$ are listed. Indeed, we can show that if $(\rho^+, \rho^-, \uparrow)$ is a dense augmentation schema, every graph has a unique size, that is, there is a function $size$ that maps every graph to an integer such that for every $g$ in $\mathcal{L}_{\rho^+}$ it holds that $r \in \rho^+(g)$ implies that $size(g + r) = size(g) + 1$. This result allows us to order the graphs that we need to enumerate level-wise.

Superficially, the idea is then as follows. We maintain the graphs that we are enumerating in a queue. Initially, this queue contains the graph $\top$. Repeatedly, we pop a graph from the queue and apply the augmentation operator; those children which are not equivalent to any graph in the queue are pushed in the queue. Due to the level-wise enumeration, any graph equivalent to a new child must still be in the queue. We need to address two issues:


• how do we avoid inserting a child which is equivalent to a child of another graph?

• how do we avoid inserting two children of the same graph that are equivalent to each other?


To make these computations possible in polynomial time, we also keep all parents of the graphs 
in the queue in memory. For each graph, both those in the queue and their parents, we store the 
following information: 


• a representation for the graph $g$;

• for each augmentation (and reduction) $r$ of $g$, we store $r$ together with an isomorphism mapping $\varphi = aug(g,r)$ (resp. $\varphi = red(g,r)$) such that $\varphi(g + r)$ (resp. $\varphi(g - r)$) is the stored representative of the isomorphism class of $g + r$ (resp. $g - r$). Until we assign a value to the $aug(g,r)$ and $red(g,r)$ variables, we will assume them initialized to $aug(g,r) = $ '?' and $red(g,r) = $ '?'. If $p(g + r)$ is false, with $p$ the monotonic predicate of interest, we will assign $aug(g,r)$ the value nil;

• a base and strong generating set (BSGS) $bsgs(g)$ of the automorphism group $Aut(g)$.


We output a graph when we pop it from the queue. Furthermore, we determine the BSGS at that 
point. The BSGS allows us to compute for each pair of augmentations of a graph if they result in 
equivalent graphs, and thus allows us to avoid two equivalent children of the same graph from being 
pushed in the queue. 

To avoid that two different graphs insert children in the queue that are equivalent, we make sure 
that the first parent of a child marks at least one augmentation in each of the other parents of the 
child. When this alternative parent is popped from the queue later, it can use this information to 
avoid pushing this child and all its equivalent children. 
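The queue discipline can be illustrated with a deliberately naive sketch. It is not the algorithm of this paper: duplicates are detected with a brute-force canonical form that tries all vertex permutations (exponential time), whereas the point of our method is to replace that step by stored isomorphism mappings and a BSGS. The augmentations used here (add one isolated vertex or one missing edge) and all function names are assumptions made for this illustration.

```python
from collections import deque
from itertools import permutations

def canonical(n, edges):
    """Brute-force canonical form: the smallest relabelled sorted edge list."""
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
        if best is None or relabelled < best:
            best = relabelled
    return n, best

def children(n, edges):
    """Assumed augmentations: add one isolated vertex or one missing edge."""
    yield n + 1, edges
    present = set(edges)
    for a in range(n):
        for b in range(a + 1, n):
            if (a, b) not in present:
                yield n, edges + ((a, b),)

def enumerate_graphs(predicate, max_size):
    """List one representative per isomorphism class, ordered by size."""
    start = (0, ())
    queue = deque([start])
    seen = {canonical(*start)}
    listed = []
    while queue:
        n, edges = queue.popleft()  # first in, first out keeps the level order
        listed.append((n, edges))
        if n + len(edges) == max_size:
            continue
        for child in children(n, edges):
            key = canonical(*child)
            if key not in seen and predicate(*child):
                seen.add(key)
                queue.append(child)
    return listed

graphs = enumerate_graphs(lambda n, edges: True, max_size=6)
```

In this sketch the size of a graph is its number of vertices plus its number of edges, so every augmentation increases the size by exactly one and the first-in, first-out queue lists the graphs level by level.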

To prove that this procedure is polynomial, we need to show that we can compute the BSGS in 
polynomial time, and that we can find all parents of a child in polynomial time. Let us start with 
this second point. 

One of the conditions of Theorem 1 is that the augmentation schema is dense, that is, that the 
graph of parents is connected; hence we can compute a spanning tree for the graph of parents. If we 
create a child, we know the parent that generated this child; by traversing the spanning tree starting 
from the reduction achieving this parent, we can traverse all possible reductions. At the same time, 
for every step we take in this spanning tree, we can determine a corresponding stored parent: every 
step in the graph of parents corresponds to a reduction followed by an augmentation, for which we 
have stored associated permutations that point to stored representatives. 

To compute the BSGS, the key observation is that we can incrementally compute the BSGS of a graph from the BSGS of one of its parents, in a similar way to the algorithm of Goldberg (1992). Given a child and its parent, we perform a base change for the parent, such that we obtain permutations in which the vertices contained in the augmentation are stabilized. This gives generators for all vertices in the child, except those contained in the augmentation. By traversing all parents (as computed when the child was created), we can determine permutations and images for these vertices as well. The resulting set of generators can be reduced to a BSGS in polynomial time.

Details of this algorithm, including optimizations and proofs of complexity, can be found in the 
appendix. 


8. Conclusions 


We introduced an algorithm for listing graphs that makes sure that no two equivalent graphs are output. We showed that for a well-defined set of conditions on a class of graphs to enumerate, this algorithm is correct and achieves polynomial delay. Classes of graphs that can be enumerated with polynomial delay are connected graphs, planar graphs, minor-closed classes of graphs, monotonic classes of graphs in general, and hereditary classes with bounded degree. To the best of our knowledge, this is the first algorithm general enough to list this range of classes efficiently, and for several of these classes no polynomial-delay algorithm was presented before. In the appendix, we show that our algorithm runs with delay $O(n^5)$ for the class of all graphs, which is an improvement over the known method of Goldberg (1992), which achieves a delay of $O(n^6)$, where $n$ is the number of vertices in the largest graph that is listed.

Most pattern mining algorithms consist of a candidate generation part and an interestingness 
evaluation part. This work contributes to the theory of pattern mining by providing a polynomial- 
delay algorithm for the first of these two common tasks. Also, our algorithm can be used as a generic 
candidate pattern generator for a wide range of algorithms for mining structured patterns, avoiding 
the need to research specialized canonical forms and enumeration strategies. 

In contrast to other graph pattern mining systems, our algorithm provides at the same time a 
data structure in which one can look up a pattern in polynomial time. Indeed: given a pattern g, 
one can construct g from the empty graph by a sequence of augmentations, and then follow the 
augmentation pointers through the data structure. Efficient lookup could be very useful when the 
set of patterns resulting from a pattern mining step is queried by the user or by algorithms taking 
this set of patterns as input. 

Even though its run-time complexity is favorable, the space complexity of our algorithm is an 
issue. The space required is polynomial in the size of the output; given that the number of listed 
graphs can be exponential, the storage requirements for large classes of graphs can be exponential. 
However, in those applications where the listed graphs need to be stored anyway, such as applica- 
tions in data analysis, this drawback is of minor concern. Moreover, we know of no other general 
approaches that obtain a better space complexity. 

As future work, we conjecture that the run-time complexity of our algorithm can be reduced further, at least to a delay of $O(n^4)$ for the class of all graphs and the class of connected graphs, through the definition of a canonical representation over the graphs. Furthermore, we hope to reduce the space requirements of our algorithm in problem settings where not all listed graphs are required to be stored, and plan to implement it in concrete data mining systems. Finally, lifting the constraint of bounded degree for hereditary classes remains an interesting problem.

Appendix A. Algorithmic Details and Proofs


In this appendix we provide details of our algorithm and proofs for our results. 


A.1 Dense Augmentation Schemas 


An important property of a dense augmentation schema is the following: 


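Lemma 2 Let $\mathcal{L}$ be a class of graphs, and let $(\rho^+, \rho^-, \uparrow)$ be a dense augmentation schema for $\mathcal{L}$. Then, there exists a function $size : \mathcal{L} \to \mathbb{N}$ such that $\forall g \in \mathcal{L}, \forall r \in \rho^+(g) : size(g + r) = size(g) + 1$.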


Proof We will say that a graph $g$ can be constructed from $\top$ in $n$ steps if there exists a sequence $\top = g_0, g_1, \ldots, g_n = g$ such that $g_{i+1} = g_i + r_i$ for some $r_i \in \rho^+(g_i)$ for $i = 0 \ldots n-1$.

We first show that there is no graph $g$ which can be constructed from $\top$ both in $n_1$ steps and in $n_2$ steps for two distinct numbers $n_1$ and $n_2$.

Assume that such a graph $g$ exists, and consider a minimal such graph $g$ (according to the order induced by the augmentation schema). Then, there exists a parent $g - r'_1$ of $g$ which can be constructed from $\top$ in $n_1 - 1$ steps, and a parent $g - r'_2$ of $g$ which can be constructed from $\top$ in $n_2 - 1$ steps. As the graph of parents $GoP_{\rho^-,\uparrow}(g)$ of $g$ is connected, there must exist two $r_1, r_2 \in \rho^-(g)$ such that $(r_1, r_2)$ is an edge of $GoP_{\rho^-,\uparrow}(g)$, $g - r_1$ can be constructed from $\top$ in $n'_1$ steps, $g - r_2$ can be constructed from $\top$ in $n'_2$ steps and $n'_1 \neq n'_2$. As $(r_1, r_2)$ is an edge of $GoP_{\rho^-,\uparrow}(g)$, we have $(r_1 \uparrow^g r_2) \in \rho^-(g - r_2)$ and $(r_2 \uparrow^g r_1) \in \rho^-(g - r_1)$, so there exists some graph $p$ such that $p = (g - r_2) - (r_1 \uparrow^g r_2)$ and $p \cong (g - r_1) - (r_2 \uparrow^g r_1)$. Let $n_p$ be the number of steps needed to construct $p$ from $\top$. Then, remembering that augmentation schemas are isomorphism-invariant, we can conclude that both $g - r_1$ and $g - r_2$ can be constructed from $\top$ in $n_p + 1$ steps. Now as $n'_1 \neq n'_2$, either $n'_1 \neq n_p + 1$ or $n'_2 \neq n_p + 1$. Without loss of generality we can assume $n'_1 \neq n_p + 1$. This means that $g - r_1$ can be constructed from $\top$ in both $n_p + 1$ and $n'_1$ steps, which contradicts the assumption that $g$ is a minimal graph that can be constructed from $\top$ in two different numbers of steps.

Therefore, we conclude that no such graph exists. Hence, in order to obtain a function $size$ satisfying the requirement, one can define $size(\top)$ to be 0 and for every $g$, $size(g)$ to be the number of steps in which $g$ can be constructed from $\top$. □


In Section 5 we stated the following. 


Lemma 3 $(\rho_c^+, \rho_c^-, \uparrow_c)$ is a dense augmentation schema.

Proof We consider the different elements of the definition of a dense augmentation schema.

First, remember that $(\rho_c^+, \rho_c^-)$ was defined by Equations (2) and (3) (see page 913 and 914). This is clearly an augmentation schema. Also, we defined $r_1 \uparrow_c^g r_2$ to be equal to $r_1$, except in the case where $r_1 = (\emptyset, \{v, u_1\})$, $r_2 = (\emptyset, \{v, u_2\})$ and $v$ has degree 2. In that case $r_1 \uparrow_c^g r_2 = r_1 \cup (\{v\}, \{\})$. It is easy to see that $\uparrow_c$ satisfies Equations (4), (5) and (6), and hence is a reduction translator.

It remains to be shown that for every non-minimal graph $g \in \mathcal{L}_c$ it holds that either the graph of parents $GoP_{\rho_c^-,\uparrow_c}(g)$ is connected, or $g - r$ is the minimal element of $\mathcal{L}_c$ for all $r \in \rho_c^-(g)$.

Consider a connected graph $g$ with at least 2 edges. We prove that $GoP_{\rho_c^-,\uparrow_c}(g)$ is connected. If $g$ is a tree, then every reduction in $\rho_c^-(g)$ removes a leaf and the edge adjacent to it. By the definition of $\uparrow_c$, $r_1 \uparrow_c^g r_2 = r_1$ and $r_1 \in \rho_c^-(g - r_2)$ will hold for any distinct $r_1, r_2 \in \rho_c^-(g)$. So if $g$ is a tree, $GoP_{\rho_c^-,\uparrow_c}(g)$ is a clique.

If $g$ is not a tree it contains at least one simple cycle $C$. We can partition $\rho_c^-(g)$ into two sets $R_1$ and $R_2$ such that $R_1$ contains all reductions removing an edge from the cycle $C$, and $R_2$ contains all other reductions. For any $r_1 \in R_1$ and $r_2 \in R_2$, it holds that removing $r_2$ from $g - r_1$ does not disconnect $g$ and vice versa. So such two $r_1$ and $r_2$ are adjacent in $GoP_{\rho_c^-,\uparrow_c}(g)$. Therefore, $GoP_{\rho_c^-,\uparrow_c}(g)$ is certainly connected if $\#R_2 \ge 1$. On the other hand, if $\#R_2 = 0$, $g$ is a simple cycle. In that case, two reductions of $R_1$ are adjacent iff they remove two adjacent edges of the cycle. So in that case, $GoP_{\rho_c^-,\uparrow_c}(g)$ is a cycle and hence connected. □
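The cycle case of this proof can be checked on a small instance. The sketch below uses a simplified model that is an assumption of this example rather than the formal definition: a reduction is identified with the edge it removes, a vertex left without edges is dropped along with it (which mimics the leaf removal and the degree-2 case of $\uparrow_c$), and two reductions are linked when removing both leaves a connected graph.

```python
from itertools import combinations

def is_connected(edges):
    """Connectivity of the graph spanned by the given edges (no isolated vertices)."""
    edges = [tuple(e) for e in edges]
    if not edges:
        return True
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    start = edges[0][0]
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adjacency[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(adjacency)

def graph_of_parents(edges):
    """Simplified graph of parents: reductions are edges, isolated vertices are dropped."""
    edges = [frozenset(e) for e in edges]
    reductions = [e for e in edges if is_connected([f for f in edges if f != e])]
    links = [
        (r1, r2)
        for r1, r2 in combinations(reductions, 2)
        if is_connected([f for f in edges if f not in (r1, r2)])
    ]
    return reductions, links

c5 = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
reductions, links = graph_of_parents(c5)
gop_is_connected = is_connected(links)
```

For the 5-cycle every edge is a reduction, only reductions that remove adjacent edges are linked, and the resulting graph of parents is itself a 5-cycle.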


A.2 Algorithm 


In this section we explain in more detail our algorithm. As stated in Section 7, for each graph, we 
store the following information: 


• a representation for the graph $g$;

• for each augmentation and reduction $r$ of $g$, we store $r$ together with an isomorphism mapping $aug(g,r)$ (resp. $red(g,r)$) that maps $g + r$ (resp. $g - r$) to the stored representative of its isomorphism class. In the algorithm, we use $aug(g,r)$ and $red(g,r)$ as functions that return the isomorphism. When the isomorphism is not yet computed, $aug(g,r)$ and $red(g,r)$ will return '?'. If $p(g + r)$ is false, $aug(g,r)$ returns the value nil;

• a base and strong generating set (BSGS) $bsgs(g)$ of the automorphism group $Aut(g)$.


Algorithm 2 shows the high level algorithm. It repeatedly takes an unprocessed graph g +r and 
processes it; g +r must be minimal according to the size function from Lemma 2. A graph which 
has been processed remains in memory till it is no longer needed in the computation for one of its 
ancestors. 

As shown by Algorithm 3, the processing of a graph $g$ includes computing a BSGS of the automorphism group of $g$ (line 2), computing reachability graphs of $Aut(g)$ (line 3), finding isomorphic variants of children (lines 8-12), and examining all other children of $g$, including constructing all $red(\cdot,\cdot)$ and some $aug(\cdot,\cdot)$ links (lines 13-24). We will describe each of the steps of the algorithm below.

Let us first consider what is known at the point PROCESS_GRAPH($g, r_0$) is called. First, all graphs $h$ for which $p(h)$ and $size(h) < size(g)$ have been processed and output, and the values for $aug(h,\cdot)$, $red(h,\cdot)$ and $bsgs(h)$ have been computed. Second, all graphs $f$ for which $p(f)$ and $size(f) = size(g)$ have either entered RefQueue or are fully processed. In any case, all values $red(f,\cdot)$ have been computed.

Algorithm 2 High-level algorithm

Require: a graph class $\mathcal{L}$, a monotonic predicate $p$ and a dense augmentation schema $(\rho^+, \rho^-, \uparrow)$
Ensure: output all $g \in \mathcal{L}$ with $p(g)$

1: RefQueue $\leftarrow \{(\top, (\emptyset, \emptyset))\}$
2: while RefQueue $\neq \emptyset$ do
3:     Let $(g, r) \in$ RefQueue such that $size(g + r)$ is minimal
4:     RefQueue $\leftarrow$ RefQueue $\setminus \{(g, r)\}$
5:     PROCESS_GRAPH($g, r$)
6:     Output $g$
7: end while

Let $S_V$ denote the bound on the number of vertices and edges that can occur in any augmentation $r \in \rho^+(g)$ with $g \in \mathcal{L}$. Then we have the following lemma.


Lemma 4 At line 2 of Algorithm 3 one can compute a BSGS for Aut(g) in time O(|V(g)|> +|V(g) |? 
IP (g)| -Sv!). 


Proof The case $g = \top$ is trivial, so we assume $g \neq \top$ and $r_0 \in \rho^-(g)$. A parent $g - r_0$ is known, and so is $bsgs(g - r_0)$ (as $size(g - r_0) = size(g) - 1$ and hence $g - r_0$ has been fully processed). By performing a base change, we can obtain a BSGS for the stabilizer $Aut(g)_{V^*(r_0)}$, which fixes all elements in $V^*(r_0)$; note that $V^*(r_0)$ contains all nodes involved in the reduction. Hence, we compute a base in which all nodes involved in the reduction are fixed. This is possible in time $O(|V(g)|^5)$. We can then extend the set of generators corresponding to this BSGS to a set of generators for $Aut(g)$ by adding for every coset of $Aut(g)_{V^*(r_0)}$ in $Aut(g)$ one representative. Each such representative $\varphi$ maps $r_0$ on some different $\varphi(r_0)$. Graphs $g - r_0$ and $g - \varphi(r_0)$ are isomorphic, which is reflected by $red(g, r_0)(g - r_0) = red(g, \varphi(r_0))(g - \varphi(r_0))$; here, $red(g, r_0)$ returns the stored permutation for reduction $r_0$ on graph $g$, which after application on the nodes and edges in graph $g - r_0$ yields the same graph as for the equivalent reduction $\varphi(r_0)$. One can therefore find all such representatives by iterating over all $r \in \rho^-(g)$, checking whether $red(g, r_0)(g - r_0) = red(g, r)(g - r)$ (all $red(g, \cdot)$ were computed earlier) and for all of these listing all possibilities to extend $red(g, r)^{-1} \circ aug(g, r_0)$ to an automorphism $(red(g, r)^{-1} \circ aug(g, r_0)) \cup \varphi_0$ (where $\varphi_0 : V(r) \to V(r_0)$). In total, the number
of representatives of cosets of Aut(g)y+(;) is bounded by |p~ (g)|-Sy!. One can eliminate the re- 


dundant automorphisms in this list in time O(|V(g)|> -|p~ (g)| -Sv!). a 


Line 3 of Algorithm 3 computes reachability graphs $Q_i$, which are used in line 9 of the algorithm to determine if two augmentations yield isomorphic graphs. These graphs help to exploit the knowledge of the just computed BSGS $bsgs(g)$ of $Aut(g)$. A reachability graph is computed for the number $i$ of nodes of graph $g$ involved in the augmentation. In the case of (un)connected graphs, an augmentation involves either one or two nodes of graph $g$, and hence a reachability graph is used for $i = 1$ or $i = 2$. $Q_i$ can be computed in time $O(i \cdot |V(g)|^i \cdot |bsgs(g)|)$. As for a reduced BSGS we have $|bsgs(g)| \le |V(g)|^2$, line 3 can be performed in time $O(S_V \cdot |V(g)|^{S_V+2})$.
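A reachability graph on $i$-tuples can be sketched as follows. The rotation and mirror of the 5-cycle $C_5$ from Section 6 stand in for the generators stored in $bsgs(g)$; the function names are assumptions of this example.

```python
from itertools import product

def reachability_graph(vertices, generators, i):
    """Edges (W, phi(W)) for every i-tuple W and every generator phi."""
    edges = {}
    for w in product(vertices, repeat=i):
        edges[w] = {tuple(phi[x] for x in w) for phi in generators}
    return edges

def reachable(edges, start):
    """All i-tuples reachable from start by following edges of Q_i."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in edges[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen

rotation = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}
mirror = {1: 1, 2: 5, 5: 2, 3: 4, 4: 3}
q2 = reachability_graph([1, 2, 3, 4, 5], [rotation, mirror], 2)
images_of_pair = reachable(q2, (1, 2))
```

The graph has $|V(g)|^i = 25$ nodes, and the pair $(1, 2)$ reaches exactly the 10 ordered pairs of adjacent vertices, which are the possible images of an augmentation that touches these two vertices.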

Algorithm 3 Processing one graph

1: procedure PROCESS_GRAPH($g, r_0$)
2:     $bsgs(g) \leftarrow$ Compute_BSGS($g, r_0$)
3:     Ensure reachability graph $Q_i$ exists for $g$ where $i = |V^*(r_0) \cap V(g)|$,
           $V(Q_i) = (V(g))^i$, $E(Q_i) = \{(W, \varphi(W)) \mid W \in V(Q_i) \wedge \varphi \in bsgs(g)\}$
4:     $R_{iso+} \leftarrow \{r \in \rho^+(g) \mid aug(g, r) \neq$ '?'$\}$
5:     done $\leftarrow$ FALSE
6:     while not(done) do
7:         if $R_{iso+} \neq \emptyset$ then
8:             Let $r \in R_{iso+}$
9:             for all $r' \in \rho^+(g)$ s.t. $\exists \varphi : (\forall v \in V(g) : \varphi(v) \in V(g)) \wedge g + r \cong_\varphi g + r'$ do
10:                $aug(g, r') \leftarrow aug(g, r) \circ \varphi^{-1}$
11:                $R_{iso+} \leftarrow R_{iso+} \setminus \{r\}$
12:            end for
13:        else if $\exists r \in \rho^+(g) : aug(g, r) =$ '?' then
14:            $s = g + r$
15:            (ok, s_par_list) $\leftarrow$ SEARCH_PARENTS($g, r, s$)
16:            if ok $\wedge\ p(s)$ then
17:                for all $(r', \varphi') \in$ s_par_list do
18:                    $f \leftarrow \varphi'(s - r')$ ; $aug(f, \varphi'(r')) = \varphi'^{-1}$ ; $red(s, r') = \varphi'$
19:                RefQueue $=$ RefQueue $\cup \{(s, r)\}$
20:            else
21:                for all $(r', \varphi') \in$ s_par_list do
22:                    $aug(\varphi'(s - r'), \varphi'(r')) \leftarrow$ nil
23:            end if
24:            $R_{iso+} \leftarrow R_{iso+} \cup \{r\}$
25:        else
26:            done $\leftarrow$ TRUE
27:        end if
28:    end while
29: end procedure





It is possible that previous calls of PROCESS_GRAPH have already assigned a value to some of the $aug(g, \cdot)$, and line 4 collects these augmentations so that their isomorphic variants can be computed in lines 8-12.

Next, the while loop at line 6 runs until all $aug(g, \cdot)$ values have been computed. As soon as a new child is identified (either by previous calls to PROCESS_GRAPH (line 4) or by newly examined children (line 24)), it is added to $R_{iso+}$, and in the next iteration all its isomorphic variants are computed. If $R_{iso+}$ is empty, the $r \in \rho^+(g)$ for which $aug(g, r) = $ '?' are isomorphic to none of the $r'$ for which $aug(g, r')$ has already been assigned a value, and a new child is considered (lines 13-24).

We will first discuss the computation of isomorphic variants of an $r \in R_{iso+}$ in line 9. Recalling the definition, all $r'$ are searched for which there is a mapping $\varphi$ such that $(\forall v \in V(g) : \varphi(v) \in V(g)) \wedge g + r \cong_\varphi g + r'$. One can do this as follows. First, use the reachability graph $Q_i$ (with $i = |V^*(r) \cap V(g)|$) to find all possible images of $V^*(r) \cap V(g)$ under the automorphism group $Aut(g)$. For each of these images of $r$, one can also compute one automorphism $\varphi_g \in Aut(g)$ under which $\varphi_g(r)$ corresponds to the image, by following the edges of $Q_i$. Then, one can check for every $r' \in \rho^+(g)$ whether there is a mapping $\varphi_r : V(r) \to V(r')$ such that for $\varphi = \varphi_g \cup \varphi_r$ we have $\varphi(r) = r'$. Traversing $Q_i$ and computing the automorphisms $\varphi_g$ along the way is possible in time $O(|V(g)|^{i+1})$. As $i \le S_V$ and we do this at most once for every $r \in \rho^+(g)$, the total time spent here is bounded by $O(|V(g)|^{S_V+1} \cdot |\rho^+(g)|)$.

Algorithm 4 search_parents

1: procedure SEARCH_PARENTS($g, r_0, s$)
2:     Construct a spanning tree $T$ for $GoP_{\rho^-,\uparrow}(s)$, the graph of parents of $s$
3:     s_par_list $\leftarrow \{(r_0, I_g)\}$
4:     Perform a depth-first search of $T$, starting at $r_0$
5:     for all edges $(r_1, r_2)$ visited during the depth-first search do
6:         $\varphi \leftarrow$ GET_PARENT($s, r_1, r_2$)
7:         if $\varphi =$ nil then return (FALSE, s_par_list)
8:         s_par_list $\leftarrow$ s_par_list $\cup \{(r_2, \varphi)\}$
9:     end for
10:    return (TRUE, s_par_list)
11: end procedure

Algorithm 5 Get one parent

1: procedure GET_PARENT($s, r_1, r_2$)
2:     $\varphi_1 \leftarrow ext($s_par_list$(s, r_1), I_s)$
3:     $f_1 \leftarrow \varphi_1(s - r_1)$
4:     $\varphi_p \leftarrow ext(red(f_1, \varphi_1(r_2 \uparrow^s r_1)), \varphi_1)$
5:     $p = \varphi_p((s - r_1) - (r_2 \uparrow^s r_1))$
6:     Let $r'_1 \in \rho^+(p)$ and $\varphi_x : V(\varphi_p(r_1 \uparrow^s r_2)) \to V(r'_1)$ such that $(I_p \cup \varphi_x)(\varphi_p(r_1 \uparrow^s r_2)) = r'_1$
7:     if $aug(p, r'_1) =$ nil then return nil
8:     else return $ext(aug(p, r'_1), (I_p \cup \varphi_x) \circ \varphi_p)$
9: end procedure

Let us now consider the investigation of new children in lines 13-24 of Algorithm 3. For a particular child $s = g + r$, the algorithm first searches the parents $f = s - r'$ for all $r' \in \rho^-(s)$. As we explain below, if all parents of $s$ satisfy the predicate $p$ the algorithm is guaranteed to find all these parents. The $aug(f, r')$ variables are then assigned a value and, if $p$ holds for $s$ itself, the $red(s, r')$ variables are assigned a value and $s$ is added to the data structures and to RefQueue.

Finding other parents of a proposed child $s = g + r$ is detailed in Algorithms 4 and 5. Before analysing this procedure and its consequences, we first explain some basic ideas. The key intuition that enables an efficient enumeration is that we can efficiently identify the (already enumerated) parents $f_2$ of the candidate graph $s$ together with a suitable isomorphism mapping $\varphi$ such that $s - r_2 \cong_\varphi f_2$ for every reduction $r_2 \in \rho^-(s)$ by using the knowledge from Equation (6) that one can obtain $p = (s - r_1) - (r_2 \uparrow^s r_1)$ also by removing the parts in a different order: $p = (s - r_2) - (r_1 \uparrow^s r_2)$. Therefore, in Algorithm 5 we first remove some $r_1 \in \rho^-(s)$ to obtain a known parent $f_1$ of $s$, then go to a grandparent $p$ by removing a (translated) $r_2$ from $f_1$, and then go downwards again from $p$ to the unknown parent $f_2$ (see Figure 3 for an illustration). In order to construct an isomorphism between $s - r_2$ and $f_2$, isomorphisms between $s - r_1$ and $f_1$, between $f_1 - (r_2 \uparrow^s r_1)$ and $p$ and between $p + (r_1 \uparrow^s r_2)$ and $f_2$ are first extended to cover the full set of vertices of $s$ (the $ext(\cdot,\cdot)$ function), and then composed (line 8).

Figure 3: Searching for parents

In Algorithm 5 we use the following notation. Let $\varphi_2$ and $\varphi_1$ be bijections between sets of vertices. Then, $ext(\varphi_2, \varphi_1)$ is a bijection that maps any $x$ for which $\varphi_1(x)$ is in the domain of $\varphi_2$ on $\varphi_2(\varphi_1(x))$ and maps any other $x$ on a new vertex.
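With bijections stored as dictionaries, the $ext$ operation could look as follows. This is a sketch under two assumptions that the definition leaves open: $x$ ranges over the domain of $\varphi_1$, and new vertices are given labels of the form `("new", k)`.

```python
from itertools import count

def ext(phi2, phi1, fresh=None):
    """Extend phi2 along phi1 as in the definition above.

    x is sent to phi2[phi1[x]] when phi1[x] lies in the domain of phi2 and to
    a new vertex otherwise. The ("new", k) labels are an assumption of this
    sketch; any labels outside the image of phi2 would do.
    """
    fresh = fresh if fresh is not None else count()
    result = {}
    for x, y in phi1.items():
        result[x] = phi2[y] if y in phi2 else ("new", next(fresh))
    return result

phi1 = {"a": 1, "b": 2, "c": 3}
phi2 = {1: "u", 2: "v"}  # vertex 3 is outside the domain of phi2
extended = ext(phi2, phi1)
```

The result stays a bijection as long as the fresh labels do not collide with images of $\varphi_2$, which is why the labels are generated from a counter rather than reused.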

Now we return to the details of Algorithms 4 and 5. First, remember that for a dense augmentation schema, the graph of parents of a particular graph is connected. Therefore, it is possible in line 2 of Algorithm 4 to construct a spanning tree for it. Algorithm 4 attempts to construct in s_par_list a mapping from reductions $r \in \rho^-(s)$ to isomorphism mappings between the graphs $s - r$ and the representatives of their isomorphism class which were output before. We know already the isomorphism mapping between $s - r_0 = g$ and the representative of its isomorphism class, which is $g$ itself (line 3). Now, if for some $r_1 \in \rho^-(s)$ we know an isomorphism mapping between $s - r_1$ and its representative $f_1$, and if for some other $r_2 \in \rho^-(s)$ the graph $s - r_2$ satisfies $p$ and has been listed earlier, then Algorithm 5 will provide us with an isomorphism mapping between $s - r_2$ and its representative $f_2$ as discussed above.

Then there are two possible cases. On the one hand, if there is a parent of $s$ that does not
fulfil the predicate $p$ and hence was not listed earlier, we will not find that parent: line 7 of
Algorithm 5 will detect this and return nil, which in turn causes Algorithm 4 to return FALSE. On
the other hand, if all parents of $s$ fulfil the predicate $p$, the search along the spanning tree
will eventually provide an isomorphism mapping between $s$ and the representatives of $s - r$ for
all $r \in \rho^{-}(s)$.
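The control flow of this search can be sketched as follows. This is only an outline under stated assumptions, not the pseudocode of Algorithm 4: the graph of parents is given as an adjacency dictionary, a breadth-first traversal plays the role of the spanning tree, `find_parent_iso` is a hypothetical stand-in for Algorithm 5, and `None` stands for both nil and FALSE.

```python
from collections import deque


def collect_parent_isos(neighbours, root, root_iso, find_parent_iso):
    """Propagate isomorphisms along a BFS spanning tree of the graph of parents.

    neighbours      -- dict: reduction -> adjacent reductions (assumed encoding)
    root, root_iso  -- the reduction leading back to g and its known isomorphism
    find_parent_iso -- stand-in for Algorithm 5: (r1, iso1, r2) -> iso2 or None
    Returns a dict reduction -> isomorphism, or None if some parent is missing.
    """
    s_par_list = {root: root_iso}
    queue = deque([root])
    while queue:
        r1 = queue.popleft()
        for r2 in neighbours[r1]:
            if r2 in s_par_list:
                continue                      # r2 was already reached via the tree
            iso2 = find_parent_iso(r1, s_par_list[r1], r2)
            if iso2 is None:
                return None                   # a parent was not listed earlier
            s_par_list[r2] = iso2
            queue.append(r2)
    return s_par_list


# Toy run: three reductions on a path a - b - c, isomorphisms replaced by labels.
toy_neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
all_found = collect_parent_isos(
    toy_neighbours, "a", ["a"], lambda r1, iso1, r2: iso1 + [r2])
one_missing = collect_parent_isos(
    toy_neighbours, "a", ["a"], lambda r1, iso1, r2: None if r2 == "c" else iso1 + [r2])
```

The traversal calls the stand-in for Algorithm 5 once per reduction other than the root, which matches the counting used in the complexity argument.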

The complexity of this search can be assessed as follows. Algorithm 5 is executed once for every
$r' \in \rho^{-}(s)$ and contains operations on permutations that can be performed in time
$O(|V(s)|)$. Assuming that $\uparrow$ can be performed efficiently, that a good data structure is
built on $\rho^{+}(p)$ so that line 6 of Algorithm 5 can be performed efficiently, that
$|\rho^{-}(s)|$ can be bounded by $O(|\rho^{-}(g)|)$, and that $|V(s)|$ can be bounded by
$O(|V(g)|)$, we can conclude that Algorithm 4 can be performed in time
$O(|\rho^{-}(g)| \cdot |V(g)|)$. These assumptions are not very strong and hold for our two
example augmentation schemas $(\rho_a^{+}, \rho_a^{-}, \uparrow_a)$ and
$(\rho_c^{+}, \rho_c^{-}, \uparrow_c)$. Algorithm 4 is run at most once for every
$r \in \rho^{+}(g)$, for a total complexity of $O(|\rho^{+}(g)| \cdot |V(g)| \cdot |\rho^{-}(g)|)$.

In summary, this appendix has provided an informal proof of the following. 


Theorem 5 Under the assumptions stated in Theorem 1, Algorithm 2 correctly lists for every iso- 
morphism class of graphs g for which p(g) = TRUE exactly one representative, and the time needed 
for outputting the next graph g is bounded by O(|V(g)|° + |V(g)|> = |p” (g)| “Sv! + Sv -|V(g)|S¥ 4? + 
IV (8) + p+ (8) + |p*(e)1-IV(e)I- IP ()1)- 


This bound is polynomial for a constant $S_V$.

For example, for enumerating connected graphs with the schema
$(\rho_c^{+}, \rho_c^{-}, \uparrow_c)$, $S_V = 2$. Therefore, this algorithm enumerates classes of
connected graphs in time O(|V(g)|>) for each output graph g in the worst case. A similar result
can be shown for enumerating (a monotonic subset of) all graphs with
$(\rho_a^{+}, \rho_a^{-}, \uparrow_a)$.

Our conjecture is that this complexity can be improved further to O(|V(g)|*). 